[metrics] Add client metrics and prometheus example#617
[metrics] Add client metrics and prometheus example#617charlesdong1991 wants to merge 2 commits into
Conversation
|
cc @fresh-borzoni @leekeiabstraction PTAL! 🙏 |
leekeiabstraction
left a comment
There was a problem hiding this comment.
TY for the PR. Left a question
| "expected fluss_client_writer_records_send_total >= {rows_per_iter}, got {sent:?}\n{rendered}" | ||
| ); | ||
| println!("self-check OK: records_send_total = {}", sent.unwrap()); | ||
| verified = true; |
There was a problem hiding this comment.
Do we need to break out of the loop once verification is done? As is, the example would run indefinitely (is that your intention?)
There was a problem hiding this comment.
Yes, it was the idea actually, because it's for prometheus demo, so how it usually works is the process should stays alive so people can scrape metric endpoint e.g. something like localhost:xxxx/metrics
if we break out of loop, then endpoint will be taken down, so users cannot scrape metrics anymore...
this verify flag was just a check to confirm recorder is wired up, and not supposed to be used for stop condition. if users really want to stop, usually, they should ctrl-C which also reflects how such prometheus setup is working
i can add a bit more explanation in code to make it clearer if you think makes sense @leekeiabstraction 🙏
There was a problem hiding this comment.
btw, maybe to avoid confusion, examples are not invoked in any CI, so this indefinite loop won't hang CI i believe
There was a problem hiding this comment.
I see, I may have misremembered that some examples are being ran during CI to ensure their correctness. This is fine if it is not ran during CI. If we do that eventually, it might be worth having it run for something like 60s instead.
leekeiabstraction
left a comment
There was a problem hiding this comment.
TY for clarifying. Left a couple more qs.
|
|
||
| let scan_records = log_scanner.poll(Duration::from_secs(1)).await?; | ||
| let mut count = 0; | ||
| for record in scan_records { |
There was a problem hiding this comment.
We are only verifying metrics for sent records, do we need the loop for polling records? If not, it might be worth removing this loop for a more concise example.
| The client caches metric handles the first time a writer or scanner is created, | ||
| binding them to whichever recorder is installed at that moment. Install your global | ||
| recorder **before** calling `FlussConnection::new` (ideally as the very first thing | ||
| in `main`). If you install it after creating a writer or scanner, those metrics will |
There was a problem hiding this comment.
This behaviour might catch user off guard. Is it worth making it so that an Error is returned instead?
Purpose
Document the implemented connection/writer/scanner client metrics and show how to collect them with a Prometheus exporter.
Linked issue: close #618